    The influence of the presence of deviant item score patterns on the power of a person-fit statistic

    Studies investigating the power of person-fit statistics often assume that the item parameters used to calculate the statistics are estimated in a sample without misfitting item score patterns. However, in practical test applications, calibration samples are likely to contain such patterns. In the present study, the influence of the type and number of misfitting patterns in the calibration sample on the detection rate of the ZU3 statistic was investigated by means of simulated data. An increase in the number of misfitting simulees resulted in a decrease in the power of ZU3. Furthermore, the type of misfit and the test length influenced the power of ZU3. The use of an iterative procedure to remove the misfitting patterns from the dataset was also investigated; results suggested that this method can be used to improve the power of ZU3. Index terms: aberrance detection, appropriateness measurement, nonparametric item response theory, person fit, person-fit statistic ZU3
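
    The iterative procedure can be sketched in a few lines of base R. This is a minimal illustration, not the authors' implementation: U3 is computed from sample proportions-correct with van der Flier's logit weights, and the flagging cutoff (0.35) and iteration cap are assumptions made for the example.

```r
## Minimal sketch of iterative calibration-sample purification with U3.
## Assumes X is a respondents-by-items matrix of 0/1 scores and every
## item has proportion-correct strictly between 0 and 1.

u3_scores <- function(X) {
  p   <- colMeans(X)                     # item proportions-correct
  ord <- order(p, decreasing = TRUE)     # easiest item first
  X   <- X[, ord, drop = FALSE]
  w   <- log(p[ord] / (1 - p[ord]))      # logit weights (van der Flier)
  k   <- ncol(X)
  apply(X, 1, function(x) {
    r <- sum(x)
    if (r == 0 || r == k) return(0)      # U3 undefined; treat as fitting
    best  <- sum(w[seq_len(r)])          # Guttman pattern: easiest r correct
    worst <- sum(w[(k - r + 1):k])       # reversed Guttman pattern
    (best - sum(x * w)) / (best - worst) # 0 = perfect fit, 1 = maximal misfit
  })
}

purify <- function(X, cutoff = 0.35, max_iter = 10) {
  keep <- rep(TRUE, nrow(X))             # patterns still in the calibration
  for (i in seq_len(max_iter)) {
    u3   <- u3_scores(X[keep, , drop = FALSE])
    flag <- u3 > cutoff
    if (!any(flag)) break                # no further misfit flagged
    keep[which(keep)[flag]] <- FALSE     # drop flagged patterns, re-calibrate
  }
  keep
}
```

    Each pass re-estimates the item proportions from the retained respondents only, which is what allows the procedure to recover power lost to a contaminated calibration sample.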

    The number of Guttman errors as a simple and powerful person-fit statistic

    A number of studies have examined the power of several statistics that can be used to detect examinees with unexpected (nonfitting) item score patterns, or to determine person fit. This study compared the power of the U3 statistic with the power of one of the simplest person-fit statistics, the sum of the number of Guttman errors. In most cases studied, (a weighted version of) the latter statistic performed as well as the U3 statistic. Counting the number of Guttman errors seems to be a useful and simple alternative to more complex statistics for determining person fit. Index terms: aberrance detection, appropriateness measurement, Guttman errors, nonparametric item response theory, person fit
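
    Counting Guttman errors is simple enough to show in full. Below is a minimal base-R sketch, assuming dichotomous 0/1 scores; the normalized variant divides by r(k − r), the maximum count attainable at total score r, which is one common weighting and not necessarily the exact weighted version compared in the study.

```r
## Number of Guttman errors per respondent. With items ordered from
## easiest to hardest, a Guttman error is a correct response to a harder
## item paired with an incorrect response to an easier one.
guttman_errors <- function(X, normalize = FALSE) {
  ord <- order(colMeans(X), decreasing = TRUE)  # easiest item first
  X   <- X[, ord, drop = FALSE]
  k   <- ncol(X)
  apply(X, 1, function(x) {
    g <- sum(cumsum(x == 0)[x == 1])  # zeros on easier items before each 1
    if (!normalize) return(g)
    r <- sum(x)
    if (r == 0 || r == k) 0 else g / (r * (k - r))  # max errors = r * (k - r)
  })
}
```

    For items already ordered easy to hard, the pattern (0, 1, 1) yields two Guttman errors, while the conforming pattern (1, 1, 0) yields none.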

    Book review: Test scoring


    Investigating invariant item ordering in the Mental Health Inventory: an illustration of the use of different methods

    Invariant item ordering is a property of scales whereby the items take the same order of difficulty across a wide range of the latent trait and across a wide range of respondents. The package ‘mokken’ in the statistical software R has recently gained the ability to analyse Mokken scales for invariant item ordering, including techniques for visually inspecting the item response curves of item pairs. While methods to assess invariant item ordering are available, there have been indications that items representing extremes of distress in mental well-being scales, such as suicidal ideation, may lead to claiming invariant item ordering where it does not exist. We used the Mental Health Inventory to examine whether any derived Mokken scales showed invariant item ordering and whether this was influenced by extreme items. A Mokken scale indicating invariant item ordering was derived. Visual inspection of the item pairs showed that the most difficult item (suicidal ideation) was located far from the remaining cluster of items. Removing this item lowered invariant item ordering to an unacceptable level.
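
    The analysis described can be reproduced in outline with the ‘mokken’ package mentioned above. The sketch below assumes X is a respondents-by-items matrix of Mental Health Inventory item scores (not supplied here); the calls follow mokken's documented interface (aisp, check.iio), but the scale selection and the choice of item to drop are illustrative.

```r
## Illustrative Mokken workflow: derive a scale, then probe invariant
## item ordering (IIO) and the influence of the most extreme item.
## X is assumed: a respondents-by-items matrix of item scores.
library(mokken)

scales <- aisp(X)[, 1]            # partition items into Mokken scales
mhi    <- X[, scales == 1]        # items assigned to the first scale

iio <- check.iio(mhi)             # test manifest invariant item ordering
summary(iio)                      # violations per item pair, HT coefficient
plot(iio)                         # visually inspect item response curves

## Re-check IIO without the least endorsed (most "difficult") item,
## e.g. suicidal ideation, to see whether it drives the apparent IIO.
iio2 <- check.iio(mhi[, -which.min(colMeans(mhi))])
summary(iio2)
```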

    The Use of Nonparametric Item Response Theory to Explore Data Quality

    The aim of this chapter is to provide insight into a number of commonly used nonparametric item response theory (NIRT) methods and to show how these methods can be used to describe and explore the psychometric quality of questionnaires used in patient-reported outcome measurement and, more generally, typical performance measurement (personality, mood, health-related constructs). NIRT is an extremely valuable tool for preliminary data analysis and for evaluating whether item response data are acceptable for parametric IRT modeling. This is particularly useful in the field of typical performance measurement, where the construct being measured is often very different from that in maximum performance measurement (education, intelligence; see Chapter 1 of this handbook). Our basic premise is that there are no “best tools” or “best models” and that the usefulness of psychometric modeling depends on the specific aims of the instrument (questionnaire, test) that is being used. Most importantly, however, it should be clear to a researcher how sensitive a specific method (for example, DETECT or Mokken scaling) is to the assumptions that are being investigated. The NIRT literature is not always clear about this, and in this chapter we try to clarify some of these ambiguities.
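
    As a concrete starting point, the exploratory checks the chapter discusses map onto a handful of calls in the R package ‘mokken’ (Mokken scaling only; DETECT is implemented elsewhere). X is again an assumed respondents-by-items score matrix.

```r
## A first NIRT screening pass with mokken; X is an assumed item-score matrix.
library(mokken)

coefH(X)                        # scalability coefficients: Hij, Hi, and H
mono <- check.monotonicity(X)   # manifest monotonicity, item by item
summary(mono)                   # counts of (significant) violations per item
plot(mono)                      # item rest-score regressions for inspection
```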

    The Crit coefficient in Mokken scale analysis: A simulation study and an application in quality-of-life research

    PURPOSE: In Mokken scaling, the Crit index was proposed, and is sometimes used, as evidence (or lack thereof) of violations of some common model assumptions. The main goal of our study was twofold: to make the formulation of the Crit index explicit and accessible, and to investigate its distribution under various measurement conditions. METHODS: We conducted two simulation studies in the context of dichotomously scored item responses. We manipulated the type of assumption violation, the proportion of violating items, the sample size, and item quality. False positive rates and power to detect assumption violations were our main outcome variables. Furthermore, we applied the Crit coefficient in a Mokken scale analysis of responses to the General Health Questionnaire (GHQ-12), a self-administered questionnaire for assessing current mental health. RESULTS: We found that the false positive rates of Crit were close to the nominal rate in most conditions and that the power to detect misfit depended on the sample size, the type of violation, and the number of assumption-violating items. Overall, in small samples Crit lacked the power to detect misfit, and in larger samples power differed considerably depending on the type of violation and the proportion of misfitting items. Our empirical example showed that even in large samples the Crit index may fail to detect assumption violations. DISCUSSION: Even in large samples, the Crit coefficient showed limited usefulness for detecting moderate and severe violations of monotonicity. Our findings are relevant to researchers and practitioners who use Mokken scaling for scale and questionnaire construction and revision. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11136-021-02924-z.
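
    For practitioners: the Crit values evaluated here are reported by mokken's diagnostic summaries. A minimal sketch, assuming X is a matrix of dichotomously scored responses (the GHQ-12 data are not bundled with the package); the 40/80 thresholds are rules of thumb from the Mokken scaling literature, which this study argues should be used cautiously.

```r
## Where Crit shows up in practice: the summary of mokken's
## assumption checks reports a Crit value per item.
library(mokken)

mono <- summary(check.monotonicity(X))
print(mono)   # the "crit" column gives Crit per item; heuristically,
              # crit > 80 is read as serious misfit and 40-80 as moderate
```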